<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="UTF-8">
	<meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
	<title>Shingle token filter | ElasticSearch 7.7 权威指南中文版</title>
	<meta name="keywords" content="ElasticSearch 权威指南中文版, elasticsearch 7, es7, 实时数据分析，实时数据检索" />
    <meta name="description" content="ElasticSearch 权威指南中文版, elasticsearch 7, es7, 实时数据分析，实时数据检索" />
    <!-- Give IE8 a fighting chance -->
    <!--[if lt IE 9]>
    <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
    <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->
	<link rel="stylesheet" type="text/css" href="../static/styles.css" />
	<script>
	var _link = 'analysis-shingle-tokenfilter.html';
    </script>
</head>
<body>
<div class="main-container">
    <section id="content">
        <div class="content-wrapper">
            <section id="guide" lang="zh_cn">
                <div class="container">
                    <div class="row">
                        <div class="col-xs-12 col-sm-8 col-md-8 guide-section">
                            <div style="color:gray; word-break: break-all; font-size:12px;">原英文版地址: <a href="https://www.elastic.co/guide/en/elasticsearch/reference/7.7/analysis-shingle-tokenfilter.html" rel="nofollow" target="_blank">https://www.elastic.co/guide/en/elasticsearch/reference/7.7/analysis-shingle-tokenfilter.html</a>, 原文档版权归 www.elastic.co 所有<br/>本地英文版地址: <a href="../en/analysis-shingle-tokenfilter.html" rel="nofollow" target="_blank">../en/analysis-shingle-tokenfilter.html</a></div>
                        <!-- start body -->
                  <div class="page_header">
<strong>重要</strong>: 此版本不会发布额外的bug修复或文档更新。最新信息请参考 <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html" rel="nofollow">当前版本文档</a>。
</div>
<div id="content">
<div class="breadcrumbs">
<span class="breadcrumb-link"><a href="index.html">Elasticsearch Guide [7.7]</a></span>
»
<span class="breadcrumb-link"><a href="analysis.html">Text analysis</a></span>
»
<span class="breadcrumb-link"><a href="analysis-tokenfilters.html">Token filter reference</a></span>
»
<span class="breadcrumb-node">Shingle token filter</span>
</div>
<div class="navheader">
<span class="prev">
<a href="analysis-reverse-tokenfilter.html">« Reverse token filter</a>
</span>
<span class="next">
<a href="analysis-snowball-tokenfilter.html">Snowball token filter »</a>
</span>
</div>
<div class="section">
<div class="titlepage"><div><div>
<h2 class="title">
<a id="analysis-shingle-tokenfilter"></a>Shingle token filter<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/analysis/tokenfilters/shingle-tokenfilter.asciidoc">edit</a>
</h2>
</div></div></div>

<p>Add shingles, or word <a href="https://en.wikipedia.org/wiki/N-gram" class="ulink" target="_top">n-grams</a>, to a token
stream by concatenating adjacent tokens. By default, the <code class="literal">shingle</code> token filter
outputs two-word shingles and unigrams.</p>
<p>For example, many tokenizers convert <code class="literal">the lazy dog</code> to <code class="literal">[ the, lazy, dog ]</code>. You
can use the <code class="literal">shingle</code> filter to add two-word shingles to this stream:
<code class="literal">[ the, the lazy, lazy, lazy dog, dog ]</code>.</p>
<div class="tip admon">
<div class="icon"></div>
<div class="admon_content">
<p>Shingles are often used to help speed up phrase queries, such as
<a class="xref" href="query-dsl-match-query-phrase.html" title="Match phrase query"><code class="literal">match_phrase</code></a>. Rather than creating shingles
using the <code class="literal">shingles</code> filter, we recommend you use the
<a class="xref" href="index-phrases.html" title="index_phrases"><code class="literal">index-phrases</code></a> mapping parameter on the appropriate
<a class="xref" href="text.html" title="Text datatype">text</a> field instead.</p>
</div>
</div>
<p>This filter uses Lucene’s
<a href="https://lucene.apache.org/core/8_5_1/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilter.html" class="ulink" target="_top">ShingleFilter</a>.</p>
<div class="section">
<div class="titlepage"><div><div>
<h3 class="title">
<a id="analysis-shingle-tokenfilter-analyze-ex"></a>Example<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/analysis/tokenfilters/shingle-tokenfilter.asciidoc">edit</a>
</h3>
</div></div></div>
<p>The following <a class="xref" href="indices-analyze.html" title="Analyze API">analyze API</a> request uses the <code class="literal">shingle</code>
filter to add two-word shingles to the token stream for <code class="literal">quick brown fox jumps</code>:</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [ "shingle" ],
  "text": "quick brown fox jumps"
}</pre>
</div>
<div class="console_widget" data-snippet="snippets/978.console"></div>
<p>The filter produces the following tokens:</p>
<div class="pre_wrapper lang-text">
<pre class="programlisting prettyprint lang-text">[ quick, quick brown, brown, brown fox, fox, fox jumps, jumps ]</pre>
</div>
<p>To produce shingles of 2-3 words, add the following arguments to the analyze API
request:</p>
<div class="ulist itemizedlist">
<ul class="itemizedlist">
<li class="listitem">
<code class="literal">min_shingle_size</code>: <code class="literal">2</code>
</li>
<li class="listitem">
<code class="literal">max_shingle_size</code>: <code class="literal">3</code>
</li>
</ul>
</div>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "shingle",
      "min_shingle_size": 2,
      "max_shingle_size": 3
    }
  ],
  "text": "quick brown fox jumps"
}</pre>
</div>
<div class="console_widget" data-snippet="snippets/979.console"></div>
<p>The filter produces the following tokens:</p>
<div class="pre_wrapper lang-text">
<pre class="programlisting prettyprint lang-text">[ quick, quick brown, quick brown fox, brown, brown fox, brown fox jumps, fox, fox jumps, jumps ]</pre>
</div>
<p>To only include shingles in the output, add an <code class="literal">output_unigrams</code> argument of
<code class="literal">false</code> to the request.</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "shingle",
      "min_shingle_size": 2,
      "max_shingle_size": 3,
      "output_unigrams": false
    }
  ],
  "text": "quick brown fox jumps"
}</pre>
</div>
<div class="console_widget" data-snippet="snippets/980.console"></div>
<p>The filter produces the following tokens:</p>
<div class="pre_wrapper lang-text">
<pre class="programlisting prettyprint lang-text">[ quick brown, quick brown fox, brown fox, brown fox jumps, fox jumps ]</pre>
</div>
</div>

<div class="section">
<div class="titlepage"><div><div>
<h3 class="title">
<a id="analysis-shingle-tokenfilter-analyzer-ex"></a>Add to an analyzer<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/analysis/tokenfilters/shingle-tokenfilter.asciidoc">edit</a>
</h3>
</div></div></div>
<p>The following <a class="xref" href="indices-create-index.html" title="Create index API">create index API</a> request uses the
<code class="literal">shingle</code> filter to configure a new <a class="xref" href="analysis-custom-analyzer.html" title="Create a custom analyzer">custom
analyzer</a>.</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard_shingle": {
          "tokenizer": "standard",
          "filter": [ "shingle" ]
        }
      }
    }
  }
}</pre>
</div>
<div class="console_widget" data-snippet="snippets/981.console"></div>
</div>

<div class="section">
<div class="titlepage"><div><div>
<h3 class="title">
<a id="analysis-shingle-tokenfilter-configure-parms"></a>Configurable parameters<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/analysis/tokenfilters/shingle-tokenfilter.asciidoc">edit</a>
</h3>
</div></div></div>
<div class="variablelist">
<dl class="variablelist">
<dt>
<span class="term">
<code class="literal">max_shingle_size</code>
</span>
</dt>
<dd>
<p>
(Optional, integer)
Maximum number of tokens to concatenate when creating shingles. Defaults to <code class="literal">2</code>.
</p>
<div class="note admon">
<div class="icon"></div>
<div class="admon_content">
<p>This value cannot be lower than the <code class="literal">min_shingle_size</code> argument, which
defaults to <code class="literal">2</code>. The difference between this value and the <code class="literal">min_shingle_size</code>
argument cannot exceed the <a class="xref" href="index-modules.html#index-max-shingle-diff"><code class="literal">index.max_shingle_diff</code></a>
index-level setting, which defaults to <code class="literal">3</code>.</p>
</div>
</div>
</dd>
<dt>
<span class="term">
<code class="literal">min_shingle_size</code>
</span>
</dt>
<dd>
<p>
(Optional, integer)
Minimum number of tokens to concatenate when creating shingles. Defaults to <code class="literal">2</code>.
</p>
<div class="note admon">
<div class="icon"></div>
<div class="admon_content">
<p>This value cannot exceed the <code class="literal">max_shingle_size</code> argument, which defaults
to <code class="literal">2</code>. The difference between the <code class="literal">max_shingle_size</code> argument and this value
cannot exceed the <a class="xref" href="index-modules.html#index-max-shingle-diff"><code class="literal">index.max_shingle_diff</code></a>
index-level setting, which defaults to <code class="literal">3</code>.</p>
</div>
</div>
</dd>
<dt>
<span class="term">
<code class="literal">output_unigrams</code>
</span>
</dt>
<dd>
(Optional, boolean)
If <code class="literal">true</code>, the output includes the original input tokens. If <code class="literal">false</code>, the output
only includes shingles; the original input tokens are removed. Defaults to
<code class="literal">true</code>.
</dd>
<dt>
<span class="term">
<code class="literal">output_unigrams_if_no_shingles</code>
</span>
</dt>
<dd>
<p>
If <code class="literal">true</code>, the output includes the original input tokens only if no shingles are
produced; if shingles are produced, the output only includes shingles. Defaults
to <code class="literal">false</code>.
</p>
<div class="important admon">
<div class="icon"></div>
<div class="admon_content">
<p>If both this and the <code class="literal">output_unigrams</code> parameter are <code class="literal">true</code>, only the
<code class="literal">output_unigrams</code> argument is used.</p>
</div>
</div>
</dd>
<dt>
<span class="term">
<code class="literal">token_separator</code>
</span>
</dt>
<dd>
(Optional, string)
Separator used to concatenate adjacent tokens to form a shingle. Defaults to a
space (<code class="literal">" "</code>).
</dd>
<dt>
<span class="term">
<code class="literal">filler_token</code>
</span>
</dt>
<dd>
<p>(Optional, string)
String used in shingles as a replacement for empty positions that do not contain
a token. This filler token is only used in shingles, not original unigrams.
Defaults to an underscore (<code class="literal">_</code>).</p>
<p>Some token filters, such as the <code class="literal">stop</code> filter, create empty positions when
removing stop words with a position increment greater than one.</p>
<details>
<summary class="title"><span class="strong strong"><strong>Example</strong></span></summary>
<div class="content">
<p>In the following <a class="xref" href="indices-analyze.html" title="Analyze API">analyze API</a> request, the <code class="literal">stop</code> filter
removes the stop word <code class="literal">a</code> from <code class="literal">fox jumps a lazy dog</code>, creating an empty
position. The subsequent <code class="literal">shingle</code> filter replaces this empty position with a
plus sign (<code class="literal">+</code>) in shingles.</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "stop",
      "stopwords": [ "a" ]
    },
    {
      "type": "shingle",
      "filler_token": "+"
    }
  ],
  "text": "fox jumps a lazy dog"
}</pre>
</div>
<div class="console_widget" data-snippet="snippets/982.console"></div>
<p>The filter produces the following tokens:</p>
<div class="pre_wrapper lang-text">
<pre class="programlisting prettyprint lang-text">[ fox, fox jumps, jumps, jumps +, + lazy, lazy, lazy dog, dog ]</pre>
</div>
</div>
</details>
</dd>
</dl>
</div>
</div>

<div class="section">
<div class="titlepage"><div><div>
<h3 class="title">
<a id="analysis-shingle-tokenfilter-customize"></a>Customize<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elastic/elasticsearch/edit/7.7/docs/reference/analysis/tokenfilters/shingle-tokenfilter.asciidoc">edit</a>
</h3>
</div></div></div>
<p>To customize the <code class="literal">shingle</code> filter, duplicate it to create the basis for a new
custom token filter. You can modify the filter using its configurable
parameters.</p>
<p>For example, the following <a class="xref" href="indices-create-index.html" title="Create index API">create index API</a> request
uses a custom <code class="literal">shingle</code> filter, <code class="literal">my_shingle_filter</code>, to configure a new
<a class="xref" href="analysis-custom-analyzer.html" title="Create a custom analyzer">custom analyzer</a>.</p>
<p>The <code class="literal">my_shingle_filter</code> filter uses a <code class="literal">min_shingle_size</code> of <code class="literal">2</code> and a
<code class="literal">max_shingle_size</code> of <code class="literal">5</code>, meaning it produces shingles of 2-5 words.
The filter also includes a <code class="literal">output_unigrams</code> argument of <code class="literal">false</code>, meaning that
only shingles are included in the output.</p>
<div class="pre_wrapper lang-console">
<pre class="programlisting prettyprint lang-console">PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "en": {
          "tokenizer": "standard",
          "filter": [ "my_shingle_filter" ]
        }
      },
      "filter": {
        "my_shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 5,
          "output_unigrams": false
        }
      }
    }
  }
}</pre>
</div>
<div class="console_widget" data-snippet="snippets/983.console"></div>
</div>

</div>
<div class="navfooter">
<span class="prev">
<a href="analysis-reverse-tokenfilter.html">« Reverse token filter</a>
</span>
<span class="next">
<a href="analysis-snowball-tokenfilter.html">Snowball token filter »</a>
</span>
</div>
</div>

                  <!-- end body -->
                        </div>
                        <div class="col-xs-12 col-sm-4 col-md-4" id="right_col">
                        
                        </div>
                    </div>
                </div>
            </section>
        </div>
    </section>
</div>
<script src="../static/cn.js"></script>
</body>
</html>